Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failing xgboost test in the cudf.pandas third-party integration tests #17616

Open
wants to merge 4 commits into
base: branch-25.02
Choose a base branch
from

Conversation

Matt711
Copy link
Contributor

@Matt711 Matt711 commented Dec 17, 2024

Description

Apart of #17490

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 Matt711 added bug Something isn't working non-breaking Non-breaking change labels Dec 17, 2024
Copy link

copy-pr-bot bot commented Dec 17, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas labels Dec 17, 2024
@Matt711
Copy link
Contributor Author

Matt711 commented Dec 17, 2024

/ok to test

@Matt711 Matt711 marked this pull request as ready for review December 17, 2024 22:26
@Matt711 Matt711 requested review from a team as code owners December 17, 2024 22:26
.github/workflows/pr.yaml Outdated Show resolved Hide resolved
# After https://github.com/dmlc/xgboost/pull/11014, .inplace_predict()
# returns a real cupy array when called on a cudf.pandas proxy dataframe.
# So we need to ensure we have a valid numpy array.
# TODO: We should remove the call to .get() when .inplace_predict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO something we need to change in cudf.pandas, or is it a bug in xgboost?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to make a change in xgboost. Let me cc @galipremsagar to get his thoughts. I believe .inplace_predict() should return a cudf.pandas proxy ndarray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related issue: #17524

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can make .inplace_predict() return a cudf.pandas proxy object. Xgboost takes different code-paths for cudf & pandas objects, for a library that executes object(gpu/cpu) aware, I don't think there is much we can do rather than the current state of fixes present in xgboost. But I also feel predict has very old code and it needs to be updated to match inplace_predict behavior. So I opened a pr asking for reviews: dmlc/xgboost#11118

@@ -39,6 +39,7 @@ jobs:
- unit-tests-cudf-pandas
- pandas-tests
- pandas-tests-diff
- third-party-integration-tests-cudf-pandas
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- third-party-integration-tests-cudf-pandas
# TODO: Remove this CI job after https://github.com/rapidsai/cudf/issues/17490 is resolved
# - third-party-integration-tests-cudf-pandas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cudf.pandas Issues specific to cudf.pandas non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants